Detecting hierarchical organization of pervasive communities by modular decomposition of Markov chain

Connecting nodes that contingently co-appear, a common process of network formation in social and biological systems, typically leads to a modular structure characterized by the absence of definite boundaries. This study seeks to find and evaluate methods to detect such modules, which we call ‘pervasive’ communities. We propose a mathematical formulation that decomposes a random walk spreading over the entire network into localized random walks, which serve as a proxy for pervasive communities. We applied this formulation to biological, social and synthetic networks to demonstrate that it properly detects communities as pervasively structured objects. We further addressed a question that is fundamental but has so far been little discussed: What is the hierarchical organization of pervasive communities, and how can it be extracted? Here we show that the hierarchical organization of pervasive communities is unveiled, from finer to coarser layers, through discrete phase transitions that intermittently occur as a resolution-controlling parameter is quasi-statically increased. To our knowledge, this is the first elucidation of how pervasiveness and hierarchy, both hallmarks of the community structure of real-world networks, are unified.


What does MDMC decompose?
Here we show that the distribution p(n) decomposed by MDMC is well approximated by $p_{\rm st}(n)$, the stationary-state distribution of the Markov chain. Given that the EM step leads to a stationary state, Eq. (9) in the main text can be rewritten as

$$p(n|k) = \frac{\alpha}{\alpha+\pi(k)} \sum_{m=1}^{N} T_{nm}\, p(m|k) + \frac{\pi(k)}{\alpha+\pi(k)}\, p_0(n|k), \qquad (1)$$

where

$$p_0(n|k) = \frac{1}{2\pi(k)} \sum_{l=1}^{L} p_{\rm st}(l)\, r(k|l) \left[\delta_{n,\,n_{\rm from}(l)} + \delta_{n,\,n_{\rm to}(l)}\right]. \qquad (2)$$

Multiplying Eq. (1) by $\alpha + \pi(k)$, summing over $k$, and using the property $\sum_{k=1}^{K} \pi(k)\, p_0(n|k) = p_{\rm st}(n)$ together with $p(n) = \sum_{k=1}^{K} \pi(k)\, p(n|k)$, one derives the exact relation

$$p(n) - p_{\rm st}(n) = \alpha \sum_{k=1}^{K} \left[\sum_{m=1}^{N} T_{nm}\, p(m|k) - p(n|k)\right]. \qquad (3)$$

If α is zero or small relative to the π(k)'s, one straightforwardly has $p(n) \approx p_{\rm st}(n)$. On the other hand, if α is relatively large, Eq. (1) can be rearranged, to first order in π(k)/α, as

$$\alpha\left[p(n|k) - \sum_{m=1}^{N} T_{nm}\, p(m|k)\right] \approx \pi(k)\left[p_0(n|k) - \sum_{m=1}^{N} T_{nm}\, p(m|k)\right]. \qquad (4)$$

Summing Eq. (4) over k and then using the exact relation (3), one obtains $\sum_{m=1}^{N} T_{nm}\, p(m) = p(n)$; the uniqueness of the stationary-state solution of the Markov chain therefore gives $p(n) = p_{\rm st}(n)$. We have confirmed that $p(n) \approx p_{\rm st}(n)$ holds over the whole range of α for a variety of networks (Fig. 4). Thus, we conclude that MDMC accurately decomposes $p_{\rm st}(n)$ into modules as a proxy for communities.

Figure 1: The difference between p(n) and $p_{\rm st}(n)$, measured by $\sum_{n=1}^{N} |p(n) - p_{\rm st}(n)|$, for a variety of networks: top left, the karate club network; top right, the mouse whole-brain networks (directed and weighted); bottom left, a benchmark network planted with pervasive communities (1000 nodes); bottom right, a Lancichinetti-Fortunato-Radicchi benchmark network (1000 nodes) [1]. In each panel, the difference averaged over 12 trials is plotted as a function of α. For every network and every value of α, the difference remains below 0.05.
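The identities used above can be checked numerically. The Python sketch below is illustrative only and rests on assumptions that are not stated in the text: the network is an unweighted directed graph, the link weight $p_{\rm st}(l)$ is taken to be the stationary flow $T_{n_{\rm to}(l),\,n_{\rm from}(l)}\, p_{\rm st}(n_{\rm from}(l))$, π(k) is set to $\sum_l p_{\rm st}(l)\, r(k|l)$ so that $p_0(\cdot|k)$ is normalized, and the responsibilities r(k|l) are drawn at random instead of coming from the EM step. The helper names (`build_chain`, `p0_given_k`, `solve_eq1`) are hypothetical. It checks the property $\sum_k \pi(k) p_0(n|k) = p_{\rm st}(n)$ and the exact relation (3), both of which hold for any such r(k|l).

```python
import numpy as np

def stationary_distribution(T):
    """Stationary p_st of a column-stochastic T, where T[n, m] is the probability of m -> n."""
    w, v = np.linalg.eig(T)
    p = np.real(v[:, np.argmin(np.abs(w - 1.0))])   # eigenvector for eigenvalue 1
    return p / p.sum()

def build_chain(links, N):
    """Unweighted directed graph -> (T, node distribution p_st(n), link flows p_st(l))."""
    T = np.zeros((N, N))
    for a, b in links:                      # link l runs from node a to node b
        T[b, a] += 1.0
    T /= T.sum(axis=0, keepdims=True)       # normalize by out-degree
    p_st = stationary_distribution(T)
    # Assumption: p_st(l) is the stationary flow carried by link l.
    flow = np.array([T[b, a] * p_st[a] for a, b in links])
    return T, p_st, flow

def p0_given_k(links, flow, r, N):
    """p_0(n|k) of Eq. (2); pi(k) = sum_l p_st(l) r(k|l) is assumed (it normalizes p_0)."""
    pi = flow @ r                           # shape (K,)
    P0 = np.zeros((N, r.shape[1]))
    for l, (a, b) in enumerate(links):
        P0[a] += flow[l] * r[l]             # delta_{n, n_from(l)} term
        P0[b] += flow[l] * r[l]             # delta_{n, n_to(l)} term
    return P0 / (2.0 * pi), pi

def solve_eq1(T, P0, pi, alpha):
    """Solve Eq. (1) for each k: ((alpha + pi_k) I - alpha T) p(.|k) = pi_k p_0(.|k)."""
    N = T.shape[0]
    cols = [np.linalg.solve((alpha + pk) * np.eye(N) - alpha * T, pk * P0[:, k])
            for k, pk in enumerate(pi)]
    return np.column_stack(cols)            # P[n, k] = p(n|k)

links = [(0, 1), (1, 0), (1, 2), (2, 0), (2, 3),
         (3, 4), (4, 3), (4, 5), (5, 3), (5, 0)]
N, K, alpha = 6, 2, 0.3
T, p_st, flow = build_chain(links, N)
rng = np.random.default_rng(0)
r = rng.dirichlet(np.ones(K), size=len(links))   # hypothetical responsibilities r(k|l)
P0, pi = p0_given_k(links, flow, r, N)
P = solve_eq1(T, P0, pi, alpha)
p = P @ pi                                       # p(n) = sum_k pi(k) p(n|k)

print(np.allclose(P0 @ pi, p_st))                # True: sum_k pi(k) p_0(n|k) = p_st(n)
# Relation (3): p(n) - p_st(n) = alpha * sum_k [sum_m T_nm p(m|k) - p(n|k)]
print(np.allclose(p - p_st, alpha * (T @ P - P).sum(axis=1)))   # True
print(np.abs(p - p_st).sum())                    # the difference measure of Figure 1
```

Because r(k|l) is random here, the printed difference need not be small; the sketch only verifies the exact identities, and setting α = 0 reproduces $p(n) = p_{\rm st}(n)$ exactly.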

Random-walk formulations of the map equation and the modularity maximization
The resolution-controlling versions of the map equation [2,3] and of modularity maximization [4,5,6], which serve as baselines in the present study, can be formulated in the framework of random walks [7,8,9,10,11]. For readers' convenience, we briefly review these formulations below.

Map equation with resolution control
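The map equation finds communities by searching for the most parsimonious way to describe a random walk taking place on the network. The resolution-controlling version of the map equation [7,8] can be derived from the continuous-time random-walk dynamics described by the master equation

$$\frac{d p_n(t)}{dt} = \sum_{m=1}^{N} T_{nm}\, p_m(t) - p_n(t). \qquad (5)$$

Its formal solution is

$$\mathbf{p}(t) = e^{-t(I-T)}\, \mathbf{p}(0), \qquad (6)$$

so that

$$P_{nm}(t) = \left[e^{-t(I-T)}\right]_{nm} \qquad (7)$$

is the probability of moving from node m to node n in an interval of length t; here, T = (T_nm) and I = (δ_nm) are the transition rate matrix and the identity matrix, respectively.

The map equation defines the description length for coding a random walk on the network. Suppose that the network is partitioned into K groups and the random walk is observed at a timescale characterized by t. According to Shannon's source coding theorem, the map equation is given as a weighted combination of entropies in the form

$$\mathcal{L}(t) = q^{\rm in}(t)\, H(\mathcal{Q}) + \sum_{k=1}^{K} p_k^{\rm tot}(t)\, H(\mathcal{P}_k), \qquad (8)$$

where

$$H(\mathcal{Q}) = -\sum_{k=1}^{K} \frac{q_k^{\rm in}(t)}{q^{\rm in}(t)} \log_2 \frac{q_k^{\rm in}(t)}{q^{\rm in}(t)}, \qquad (9)$$

$$H(\mathcal{P}_k) = -\frac{q_k^{\rm out}(t)}{p_k^{\rm tot}(t)} \log_2 \frac{q_k^{\rm out}(t)}{p_k^{\rm tot}(t)} - \sum_{n \in k} \frac{p_n}{p_k^{\rm tot}(t)} \log_2 \frac{p_n}{p_k^{\rm tot}(t)}, \qquad p_k^{\rm tot}(t) = q_k^{\rm out}(t) + \sum_{n\in k} p_n. \qquad (10)$$

Here, $q_k^{\rm out}(t)$ and $q_k^{\rm in}(t)$ are the probabilities of leaving and arriving at group k (= 1, ..., K) within t, respectively, and p_n is the steady-state solution of Eq. (5):

$$q_k^{\rm out}(t) = \sum_{n\in k} \sum_{m \notin k} P_{mn}(t)\, p_n, \qquad (11)$$

$$q_k^{\rm in}(t) = \sum_{n\in k} \sum_{m \notin k} P_{nm}(t)\, p_m. \qquad (12)$$

The quantities $q^{\rm out}(t)$ and $q^{\rm in}(t)$ are the probabilities of leaving and arriving at any group within t, respectively:

$$q^{\rm out}(t) = \sum_{k=1}^{K} q_k^{\rm out}(t), \qquad q^{\rm in}(t) = \sum_{k=1}^{K} q_k^{\rm in}(t). \qquad (13)$$

Since exact calculation of Eq. (6) is infeasible, especially for large networks, the following first-order approximation, $e^{-t(I-T)} \approx I - t(I-T)$, is adopted [7,8]:

$$P_{mn}(t) \approx t\, T_{mn} \quad (m \neq n), \qquad q_k^{\rm out}(t) \approx t \sum_{n\in k}\sum_{m\notin k} T_{mn}\, p_n, \qquad q_k^{\rm in}(t) \approx t \sum_{n\in k}\sum_{m\notin k} T_{nm}\, p_m. \qquad (14)$$

Among all partitions, the one that minimizes the map equation (8) defines the decomposition of the network into communities. The above formulation has a single parameter t, which controls the resolution of community detection: for larger values of t, the network is decomposed into a smaller number of larger communities.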
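The following Python sketch evaluates the map equation (8) for a given partition of an undirected, unweighted network. It assumes the standard two-level form, in which the index codebook is weighted by the total arriving probability and each module codebook by its leaving probability plus its internal node visits, and it uses the linearized probabilities $P_{mn}(t) \approx t\,T_{mn}$ for m ≠ n. The function names are hypothetical, and this is not the implementation used in [7,8].

```python
import numpy as np

def plogp(x):
    """Elementwise x * log2(x), with the convention 0 * log2(0) = 0."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    nz = x > 0
    out[nz] = x[nz] * np.log2(x[nz])
    return out

def entropy(weights):
    """Shannon entropy in bits of the normalized weights (0 if all weights vanish)."""
    total = weights.sum()
    return 0.0 if total <= 0 else float(-plogp(weights / total).sum())

def map_equation(A, labels, t=1.0):
    """Two-level map equation at Markov time t with linearized leaving/arriving probabilities."""
    A = np.asarray(A, dtype=float)
    T = A / A.sum(axis=0, keepdims=True)        # T[n, m]: probability of moving m -> n
    p = A.sum(axis=0) / A.sum()                 # stationary k_n / 2L (undirected network)
    labels = np.asarray(labels)
    length, q_in = 0.0, []
    for k in np.unique(labels):
        inside = labels == k
        # Leaving k: t * sum_{n in k} sum_{m not in k} T[m, n] p_n
        q_out_k = t * (T[np.ix_(~inside, inside)] @ p[inside]).sum()
        # Arriving at k: t * sum_{n in k} sum_{m not in k} T[n, m] p_m
        q_in.append(t * (T[np.ix_(inside, ~inside)] @ p[~inside]).sum())
        module = np.concatenate(([q_out_k], p[inside]))
        length += module.sum() * entropy(module)        # p_k^tot * H(P_k)
    q_in = np.array(q_in)
    return length + q_in.sum() * entropy(q_in)          # + q^in * H(Q)

K4 = np.ones((4, 4)) - np.eye(4)
A = np.block([[K4, np.zeros((4, 4))], [np.zeros((4, 4)), K4]])
A[3, 4] = A[4, 3] = 1.0                                  # single bridge between two cliques
L_two = map_equation(A, [0, 0, 0, 0, 1, 1, 1, 1])
L_one = map_equation(A, [0] * 8)
print(L_two < L_one)                                     # True
```

For two 4-cliques joined by one link, the two-clique partition yields a shorter description (about 2.46 bits) than the single-group partition (about 2.99 bits, the entropy of the stationary distribution) at t = 1.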

Modularity maximization with resolution control
The resolution-controlling version of modularity maximization can also be derived from the continuous-time random-walk dynamics [9,10,11]. For a given partition into K groups, the probability that a random walker remains in the same community within an interval of length t, relative to that expected under randomization at equilibrium, is given by

$$R(t) = \sum_{n,m=1}^{N} \left( \left[e^{-t(I-T)}\right]_{nm} p_m - p_n p_m \right) \delta(g_n, g_m), \qquad (15)$$

where g_n represents the group to which node n belongs; δ(g_n, g_m) = 1 if nodes n and m belong to the same group and zero otherwise; and p_n is the steady-state solution of the master equation (5). Expanding $e^{-t(I-T)}$ to first order in t, that is, $e^{-t(I-T)} \approx (1-t)I + tT$, and using $\sum_n p_n = 1$ yields

$$R(t) \approx (1 - t) + \sum_{n,m=1}^{N} \left( t\, T_{nm}\, p_m - p_n p_m \right) \delta(g_n, g_m). \qquad (16)$$

We hence define

$$Q(t) = (1 - t) + \sum_{n,m=1}^{N} \left( t\, T_{nm}\, p_m - p_n p_m \right) \delta(g_n, g_m). \qquad (17)$$

In the case of undirected, binary networks, where A_nm = 1 (connected) or 0 (disconnected), T_nm = A_nm/k_m with k_m being the degree of node m, and the steady-state solution is simply p_n = k_n/2L, where L is the number of links. We therefore have

$$Q(t) = (1 - t) + \sum_{n,m=1}^{N} \left( t\,\frac{A_{nm}}{2L} - \frac{k_n k_m}{(2L)^2} \right) \delta(g_n, g_m). \qquad (18)$$

When t = 1, Eq. (18) reduces to Newman's classical modularity. Communities are sought by searching for the partition that maximizes Q(t), the first-order approximation of the relative probability (15). The only parameter, t, controls the resolution of community detection: for larger values of t, the network is decomposed into a smaller number of larger communities.
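As a sketch, assuming an undirected binary network stored as a NumPy adjacency matrix, the linearized objective Q(t) can be computed directly from $T_{nm} = A_{nm}/k_m$ and $p_n = k_n/2L$. The function name `markov_q` is hypothetical; the example reuses two 4-cliques joined by a single link.

```python
import numpy as np

def markov_q(T, p, labels, t=1.0):
    """Q(t) = (1 - t) + sum_{n,m} (t T_nm p_m - p_n p_m) delta(g_n, g_m)."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]            # delta(g_n, g_m)
    B = t * T * p[None, :] - np.outer(p, p)              # t T_nm p_m - p_n p_m
    return (1.0 - t) + B[same].sum()

K4 = np.ones((4, 4)) - np.eye(4)
A = np.block([[K4, np.zeros((4, 4))], [np.zeros((4, 4)), K4]])
A[3, 4] = A[4, 3] = 1.0
k = A.sum(axis=0)                                        # degrees k_n
T = A / k[None, :]                                       # T_nm = A_nm / k_m
p = k / k.sum()                                          # p_n = k_n / 2L
two, one = [0, 0, 0, 0, 1, 1, 1, 1], [0] * 8

print(markov_q(T, p, two, 1.0))   # Newman's modularity, 12/13 - 1/2 up to rounding
print(markov_q(T, p, one, 1.0))   # single group: 0 for every t, up to rounding
print(markov_q(T, p, two, 10.0))  # 0.5 - 10/13 < 0: the coarser partition wins
```

For this graph the two-clique partition gives Q(t) = 0.5 − t/13 while the single group gives Q(t) = 0, so the finer partition is preferred only for t < 6.5; this is a small illustration of how increasing t favours coarser partitions.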

Louvain method
Since finding, among all possible partitions, the one that globally maximizes the modularity is NP-hard, a heuristic approach called the Louvain method [13] is generally used. It performs a greedy maximization of the modularity in an agglomerative manner: first, each node is assigned to its own group as the single member; then, a randomly chosen pair of neighbouring groups is merged into a larger group if the merger increases the modularity; the partition is updated in this way until merging any pair of neighbouring groups no longer increases the modularity. The groups of the resulting partition are defined as the communities of the network. This greedy approach does not guarantee the global maximum, yet it has shown high performance in comparative studies of community detection. The map equation is minimized by the Louvain method in the same way, except that a pair of neighbouring groups is merged if the merger decreases, rather than increases, the objective function (the map equation).
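The aggregation procedure described above can be sketched in Python as follows, using Newman's modularity as the objective. This is an illustrative greedy merger of neighbouring groups, not the reference Louvain implementation of [13], which additionally moves individual nodes between neighbouring groups before aggregating. The function names are hypothetical.

```python
import numpy as np

def merge_gain(A, ga, gb, two_L):
    """Modularity change when groups ga and gb merge: 2 [A_ab / 2L - d_a d_b / (2L)^2]."""
    ia, ib = sorted(ga), sorted(gb)
    a_ab = A[np.ix_(ia, ib)].sum()                  # links between the two groups
    d_a, d_b = A[ia].sum(), A[ib].sum()             # total degrees of the groups
    return 2.0 * (a_ab / two_L - d_a * d_b / two_L**2)

def greedy_aggregation(A, seed=0):
    """Merge randomly chosen improving pairs of groups until no merge increases modularity."""
    A = np.asarray(A, dtype=float)
    two_L = A.sum()
    rng = np.random.default_rng(seed)
    groups = [{n} for n in range(A.shape[0])]       # each node starts as its own group
    while True:
        # Without isolated nodes, non-adjacent groups have a_ab = 0 and a negative gain,
        # so only pairs of neighbouring groups can qualify.
        pairs = [(a, b) for a in range(len(groups)) for b in range(a + 1, len(groups))
                 if merge_gain(A, groups[a], groups[b], two_L) > 0]
        if not pairs:
            return groups
        a, b = pairs[rng.integers(len(pairs))]      # random improving pair
        groups[a] |= groups[b]                      # a < b, so index a stays valid
        del groups[b]

K4 = np.ones((4, 4)) - np.eye(4)
A = np.block([[K4, np.zeros((4, 4))], [np.zeros((4, 4)), K4]])
A[3, 4] = A[4, 3] = 1.0
groups = greedy_aggregation(A, seed=0)
print([sorted(g) for g in groups])
```

Because pairs are chosen at random, the final partition can depend on the seed, in line with the remark that the greedy approach does not guarantee the global maximum. For the map equation, the acceptance test would instead require a decrease in description length.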

Supplemental data of comparative experimental results
Supplemental results of the comparative experiment (Fig. 3 in the main text) are shown in Fig. S2.

MDMC's performance in partition detection
The map equation and modularity maximization, even in their resolution-controlling versions, detect partitions, that is, non-overlapping communities with definite boundaries. MDMC, originally designed to detect communities as pervasively structured objects, can also perform definite community detection by assigning each node n to the module $\arg\max_k p(k|n)$, as described in the main text. We therefore evaluated definite community detection by MDMC, taking the resolution-controlling versions of the map equation and modularity maximization as baselines. We conducted this evaluation using a variety of networks with ground truths, for instance, Zachary's karate club [14], American College football [15], Books about US politics [16], and Lancichinetti-Fortunato-Radicchi benchmark networks [1] with mixing parameters µ = 0.1, 0.3 and 0.5. The normalized mutual information (NMI) between the detected and ground-truth communities was calculated as a function of the resolution parameter (α for MDMC and t for the map equation and modularity maximization). The results shown in Fig. 3 demonstrate that MDMC performs at the same level as the map equation and modularity maximization, which are among the best-performing and most standard methods for definite community detection.
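The hard assignment and the evaluation score can be sketched as follows. The sketch assumes that p(k|n) is obtained from π(k) and p(n|k) by Bayes' rule, and it normalizes NMI by the arithmetic mean of the two partition entropies, which is one of several conventions in use; the normalization behind Fig. 3 is not specified here. The function names and the toy p(n|k) matrix are hypothetical.

```python
import numpy as np

def hard_labels(P_n_given_k, pi):
    """Assign node n to arg max_k p(k|n), where p(k|n) is proportional to pi(k) p(n|k)."""
    joint = P_n_given_k * pi[None, :]               # pi(k) p(n|k), shape (N, K)
    return np.argmax(joint, axis=1)                 # dividing by p(n) leaves argmax unchanged

def nmi(x, y):
    """NMI with arithmetic-mean normalization: 2 I(X;Y) / (H(X) + H(Y))."""
    _, xi = np.unique(np.asarray(x), return_inverse=True)
    _, yi = np.unique(np.asarray(y), return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1.0)                 # contingency table of label pairs
    joint /= joint.sum()
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    mi = (joint[nz] * np.log(joint[nz] / np.outer(px, py)[nz])).sum()
    hx, hy = -(px * np.log(px)).sum(), -(py * np.log(py)).sum()
    return 1.0 if hx + hy == 0 else 2.0 * mi / (hx + hy)   # both trivial -> identical

# Hypothetical p(n|k) for 6 nodes and 2 modules (each column sums to one).
P = np.array([[0.30, 0.00], [0.30, 0.05], [0.30, 0.05],
              [0.05, 0.30], [0.05, 0.30], [0.00, 0.30]])
pi = np.array([0.5, 0.5])
labels = hard_labels(P, pi)
truth = [1, 1, 1, 0, 0, 0]
print(labels, nmi(labels, truth))
```

NMI is invariant to relabeling, so the detected labels [0, 0, 0, 1, 1, 1] match the ground truth [1, 1, 1, 0, 0, 0] perfectly.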